Lab Assignment Three: Extending Logistic Regression
Richmond Aisabor
Wine is an alcoholic drink typically made from fermented grape juice. Different varieties of yeasts and grapes produce different styles of wine. Vinho Verde refers to wine originating from northern Portugal. Many countries have legal appellations that restrict the geographical origin, permitted grape varieties, and other aspects of wine production. Vinho Verde is not a grape variety but a protected designation of origin for the production of wine. Vinho Verde comes in red, white, and rosé styles; however, this analysis focuses on red Vinho Verde wine. The dataset contains a set of red Vinho Verde wine samples, and the goal is to classify the wine samples by quality based on physicochemical features.
The classification model for red wine quality could be of interest to those seeking to create the perfect wine. This dataset includes 11 features, but some of these features may not be relevant to determining quality. Through feature selection, it becomes clear what exactly a winemaker should focus on when cultivating grapes to produce wine. Since this dataset presents a unique opportunity for the research and development side of the wine industry, this model is most useful for offline analysis.
Dataset source: https://archive.ics.uci.edu/ml/datasets/Wine+Quality
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore")
df = pd.read_csv('winequality-red.csv', sep=';')
# replace white spaces in column names with underscores
df = df.rename(columns = {'fixed acidity': 'fixed_acidity', 'volatile acidity': 'volatile_acidity', 'citric acid': 'citric_acid', 'residual sugar': 'residual_sugar', 'free sulfur dioxide': 'free_sulfur_dioxide', 'total sulfur dioxide': 'total_sulfur_dioxide'})
df.head()
| | fixed_acidity | volatile_acidity | citric_acid | residual_sugar | chlorides | free_sulfur_dioxide | total_sulfur_dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.70 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.88 | 0.00 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.20 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.9970 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.9980 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.70 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
# find the data type
print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1599 entries, 0 to 1598
Data columns (total 12 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   fixed_acidity         1599 non-null   float64
 1   volatile_acidity      1599 non-null   float64
 2   citric_acid           1599 non-null   float64
 3   residual_sugar        1599 non-null   float64
 4   chlorides             1599 non-null   float64
 5   free_sulfur_dioxide   1599 non-null   float64
 6   total_sulfur_dioxide  1599 non-null   float64
 7   density               1599 non-null   float64
 8   pH                    1599 non-null   float64
 9   sulphates             1599 non-null   float64
 10  alcohol               1599 non-null   float64
 11  quality               1599 non-null   int64
dtypes: float64(11), int64(1)
memory usage: 150.0 KB
None
Based on the dataframe information, there are no missing values in the dataset: each feature has 1599 non-null entries across the 1599 rows. If there were missing data, I could impute the values with the median or mode, since all of the features are numerical.
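As a hedged sketch of that fallback, median imputation with pandas could look like this (using a tiny made-up frame rather than the wine data, since the wine data has no gaps):

```python
import numpy as np
import pandas as pd

# toy frame standing in for the wine data; NaN marks a missing reading
toy = pd.DataFrame({'alcohol': [9.4, 9.8, np.nan, 11.2],
                    'quality': [5, 5, 6, 6]})

# fill every numeric column with that column's own median
toy = toy.fillna(toy.median(numeric_only=True))
print(toy['alcohol'].tolist())  # [9.4, 9.8, 9.8, 11.2]
```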
# find the data summary
df.describe()
| | fixed_acidity | volatile_acidity | citric_acid | residual_sugar | chlorides | free_sulfur_dioxide | total_sulfur_dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1599.000000 | 1599.000000 | 1599.000000 | 1599.000000 | 1599.000000 | 1599.000000 | 1599.000000 | 1599.000000 | 1599.000000 | 1599.000000 | 1599.000000 | 1599.000000 |
| mean | 8.319637 | 0.527821 | 0.270976 | 2.538806 | 0.087467 | 15.874922 | 46.467792 | 0.996747 | 3.311113 | 0.658149 | 10.422983 | 5.636023 |
| std | 1.741096 | 0.179060 | 0.194801 | 1.409928 | 0.047065 | 10.460157 | 32.895324 | 0.001887 | 0.154386 | 0.169507 | 1.065668 | 0.807569 |
| min | 4.600000 | 0.120000 | 0.000000 | 0.900000 | 0.012000 | 1.000000 | 6.000000 | 0.990070 | 2.740000 | 0.330000 | 8.400000 | 3.000000 |
| 25% | 7.100000 | 0.390000 | 0.090000 | 1.900000 | 0.070000 | 7.000000 | 22.000000 | 0.995600 | 3.210000 | 0.550000 | 9.500000 | 5.000000 |
| 50% | 7.900000 | 0.520000 | 0.260000 | 2.200000 | 0.079000 | 14.000000 | 38.000000 | 0.996750 | 3.310000 | 0.620000 | 10.200000 | 6.000000 |
| 75% | 9.200000 | 0.640000 | 0.420000 | 2.600000 | 0.090000 | 21.000000 | 62.000000 | 0.997835 | 3.400000 | 0.730000 | 11.100000 | 6.000000 |
| max | 15.900000 | 1.580000 | 1.000000 | 15.500000 | 0.611000 | 72.000000 | 289.000000 | 1.003690 | 4.010000 | 2.000000 | 14.900000 | 8.000000 |
# create a data description table
data_des = pd.DataFrame()
pd.set_option('display.max_colwidth',1000)
data_des['Features'] = df.columns
data_des['Description'] = ['concentration of titratable acids', 'concentration of acetic acids, indicator of spoilage',
'concentration of citric acid, acts as preservative used to increase acidity', 'concentration of natural grape sugars, indicator of sweetness',
'amount of chlorides present in wine, indicates saltiness', 'concentration of free sulfur dioxide, acts as a sanitizing and preserving agent',
'concentration of total sulfur dioxide, acts as a sanitizing and preserving agent', 'measure of alcohol concentration and sweetness',
'concentration of hydrogen ion, differentiates between acidic and basic', 'measure of salts of sulphuric acid, indicator of sharpness', 'concentration of alcohol', 'measure of wine quality']
data_des['Scales'] = ['ratio'] * 11 + ['ordinal']
data_des['Discrete/Continuous'] = ['continuous'] * 5 + ['discrete'] * 2 + ['continuous'] * 4 + ['discrete']
data_des['Range'] = ['4.60 - 15.90', '0.12 - 1.58', '0.00 - 1.00', '0.90 - 15.50', '0.01 - 0.61', '1 - 72', '6 - 289', '0.99 - 1', '2.74 - 4.01', '0.33 - 2.00', '8.40 - 14.90', '3 - 8']
data_des
| | Features | Description | Scales | Discrete/Continuous | Range |
|---|---|---|---|---|---|
| 0 | fixed_acidity | concentration of titratable acids | ratio | continuous | 4.60 - 15.90 |
| 1 | volatile_acidity | concentration of acetic acids, indicator of spoilage | ratio | continuous | 0.12 - 1.58 |
| 2 | citric_acid | concentration of citric acid, acts as preservative used to increase acidity | ratio | continuous | 0.00 - 1.00 |
| 3 | residual_sugar | concentration of natural grape sugars, indicator of sweetness | ratio | continuous | 0.90 - 15.50 |
| 4 | chlorides | amount of chlorides present in wine, indicates saltiness | ratio | continuous | 0.01 - 0.61 |
| 5 | free_sulfur_dioxide | concentration of free sulfur dioxide, acts as a sanitizing and preserving agent | ratio | discrete | 1 - 72 |
| 6 | total_sulfur_dioxide | concentration of total sulfur dioxide, acts as a sanitizing and preserving agent | ratio | discrete | 6 - 289 |
| 7 | density | measure of alcohol concentration and sweetness | ratio | continuous | 0.99 - 1 |
| 8 | pH | concentration of hydrogen ion, differentiates between acidic and basic | ratio | continuous | 2.74 - 4.01 |
| 9 | sulphates | measure of salts of sulphuric acid, indicator of sharpness | ratio | continuous | 0.33 - 2.00 |
| 10 | alcohol | concentration of alcohol | ratio | continuous | 8.40 - 14.90 |
| 11 | quality | measure of wine quality | ordinal | discrete | 3 - 8 |
import seaborn as sns
import matplotlib.pyplot as plt
cmap = sns.diverging_palette(220, 10, as_cmap=True)
#plot the correlation matrix using seaborn
sns.set(style="darkgrid") # one of the many styles to plot using
f, ax = plt.subplots(figsize=(9, 9))
sns.heatmap(df.corr(), cmap=cmap, annot=True)
f.tight_layout()
plt.title('Correlation Matrix Graph')
Text(0.5, 1.0, 'Correlation Matrix Graph')
To reduce the dimensionality of this dataset, I will use this correlation matrix to determine which features are highly correlated with one another. Some of the features serve the same purpose in winemaking, as shown in the data description table. Correlations between features that serve the same purpose require special consideration, because such features are the most likely to correlate.
Density, fixed acidity and citric acid show a strong positive correlation. Total sulfur dioxide and free sulfur dioxide also show a strong positive correlation. Fixed acidity and pH show a strong negative correlation.
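Reading strong pairs off the heatmap can also be done programmatically. The sketch below shows the idea on a small synthetic frame (the 0.6 cutoff is an illustrative choice, not a value from the original analysis):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(size=200)
# b is a noisy copy of a; c is independent of both
demo = pd.DataFrame({'a': a,
                     'b': a + 0.1 * rng.normal(size=200),
                     'c': rng.normal(size=200)})

corr = demo.corr().abs()
# keep only the upper triangle so each pair is listed once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
high = upper.stack()
high = high[high > 0.6]
print(high.index.tolist())  # [('a', 'b')]
```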
import copy
# remove columns that are not useful
df_reshape = copy.deepcopy(df)
df_reshape.drop(columns=['free_sulfur_dioxide', 'pH'], axis=1, inplace=True)
df_reshape.head()
| | fixed_acidity | volatile_acidity | citric_acid | residual_sugar | chlorides | total_sulfur_dioxide | density | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.70 | 0.00 | 1.9 | 0.076 | 34.0 | 0.9978 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.88 | 0.00 | 2.6 | 0.098 | 67.0 | 0.9968 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 54.0 | 0.9970 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 60.0 | 0.9980 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.70 | 0.00 | 1.9 | 0.076 | 34.0 | 0.9978 | 0.56 | 9.4 | 5 |
Total sulfur dioxide and free sulfur dioxide are used to kill unwanted bacteria and preserve wine. Keeping both features is redundant so it is safe to remove one. Since total sulfur dioxide explains how much free sulfur dioxide is in the wine and how much is bound to other chemicals, it is best to remove the free sulfur dioxide feature from the dataset.
Although density, fixed acidity and citric acid show strong correlations, they each explain a unique aspect of the wine, so these features will stay.
pH and fixed acidity correlate inversely, but both measure acidity. The pH values in the data range from 2.74 to 4.01, and any solution below 7 is considered acidic, so pH shows that every wine sample is acidic but not much else. Since fixed acidity measures titratable acids and gives an estimate of the total concentration of acid in each sample, it is the better measure. Thus it is best to remove pH because it does not do a great job of explaining acidity.
plt.figure(figsize=(10,7))
sns.histplot(df_reshape.quality, kde=True) # histplot replaces the deprecated distplot
plt.title('Distribution of Quality')
Text(0.5, 1.0, 'Distribution of Quality')
The quality distribution plot is bimodal, with two peaks at quality scores of 5 and 6. This means a random wine sample has the highest probability of being classified as a 5 or 6. This information helps divide the quality values into three ranges: since many wine samples have a quality of 5 or 6, that range can be set as the middle range, values greater than 6 as the upper range, and values less than 5 as the lower range. The wine samples will be divided into three classes:
df_reshape['quality'] = pd.cut(df_reshape.quality,[0,4,6,10], labels=['poor','good','excellent'])
# map each quality label to a numerical indicator
df_reshape['quality'] = df_reshape['quality'].map({'poor': 1, 'good': 2, 'excellent': 3})
df_reshape.head()
| | fixed_acidity | volatile_acidity | citric_acid | residual_sugar | chlorides | total_sulfur_dioxide | density | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.70 | 0.00 | 1.9 | 0.076 | 34.0 | 0.9978 | 0.56 | 9.4 | 2 |
| 1 | 7.8 | 0.88 | 0.00 | 2.6 | 0.098 | 67.0 | 0.9968 | 0.68 | 9.8 | 2 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 54.0 | 0.9970 | 0.65 | 9.8 | 2 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 60.0 | 0.9980 | 0.58 | 9.8 | 2 |
| 4 | 7.4 | 0.70 | 0.00 | 1.9 | 0.076 | 34.0 | 0.9978 | 0.56 | 9.4 | 2 |
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
knn = KNeighborsClassifier(n_neighbors=1)
# create two data frames, one with the features and one with the target
X = copy.deepcopy(df_reshape)
X.drop(columns=['quality'], axis=1, inplace=True)
X = StandardScaler().fit_transform(X) # standardization: zero mean, unit variance
y = df_reshape['quality'].copy()
y = y.values
x_train,x_test,y_train,y_test=train_test_split(X,y,test_size=0.2, train_size=0.8, random_state=123)
knn.fit(x_train,y_train)
acc = accuracy_score(knn.predict(x_test),y_test)
print(f"accuracy:{100*acc:.2f}%")
accuracy:81.25%
The wine quality dataset has 1599 instances, so it is a medium-sized dataset. If it had about 100,000 instances it would be considered large, and with only 100 instances it would be considered small. The goal in splitting the data is to leave enough data for training to achieve good generalization on new data, so the number of instances matters when deciding how the data should split. With an 80/20 split there are more than enough instances to train the classifier and not too few to test its accuracy: about 1279 instances are used for training, which is still a medium-sized set, and the classifier is evaluated on about 320 instances.
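The 1279/320 arithmetic can be checked directly; this sketch uses placeholder arrays of the same length as the wine data rather than the data itself:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_demo = np.zeros((1599, 9))   # placeholder: 1599 samples, 9 features
y_demo = np.zeros(1599)
x_tr, x_te, _, _ = train_test_split(X_demo, y_demo, test_size=0.2,
                                    train_size=0.8, random_state=123)
print(len(x_tr), len(x_te))  # 1279 320
```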
%%time
# now let's do some vectorized coding
import numpy as np
from scipy.special import expit
from numpy.linalg import pinv
import sys
class VectorBinaryLogisticRegression():
#private:
def __init__(self, eta, iterations=20, C1=0.001, C2=0.001, solver="steepest", regularize="none"):
self.eta = eta
self.iters = iterations
self.C1 = C1
self.C2 = C2
self.solver = solver # options are steepest, stochastic and newton
self.regularize = regularize
# internally we will store the weights as self.w_ to keep with sklearn conventions
#private:
def __str__(self):
if(hasattr(self,'w_')):
return 'Vector Binary Logistic Regression Object with coefficients:\n'+ str(self.w_) # if we have trained the object
else:
return 'Untrained Vector Binary Logistic Regression Object'
@staticmethod
def _add_bias(X):
return np.hstack((np.ones((X.shape[0],1)),X)) # add bias term
@staticmethod
def _sigmoid(theta):
# increase stability, redefine sigmoid operation
return expit(theta) #1/(1+np.exp(-theta))
# public:
def predict_proba(self, X, add_bias=True):
# add bias term if requested
Xb = self._add_bias(X) if add_bias else X
return self._sigmoid(Xb @ self.w_) # return the probability y=1
# public
def regular_penalty(self, grad):
    # subgradient of |w|: sign(w), which is 0 wherever a weight is exactly 0
    sign_w = np.sign(self.w_)
    if(self.regularize == "L2"):
        grad[1:] += -2 * self.w_[1:] * self.C1
    elif(self.regularize == "L1"):
        grad[1:] += -sign_w[1:] * self.C1 # penalty must shrink weights during gradient ascent
    elif(self.regularize == "L1+L2"):
        grad[1:] += -2 * self.w_[1:] * self.C1
        grad[1:] += -sign_w[1:] * self.C2
    elif(self.regularize != "none"): # "none" skips the penalty instead of exiting
        sys.exit("Unknown regularization option: " + self.regularize)
    return grad
# vectorized gradient calculation
def _get_gradient(self,X,y):
if(self.solver == "steepest"):
ydiff = y-self.predict_proba(X,add_bias=False).ravel() # get y difference
gradient = np.mean(X * ydiff[:,np.newaxis], axis=0) # make ydiff a column vector and multiply through
gradient = gradient.reshape(self.w_.shape)
gradient = self.regular_penalty(gradient)
elif(self.solver == "stochastic"):
idx = int(np.random.rand()*len(y)) # grab random instance
ydiff = y[idx]-self.predict_proba(X[idx],add_bias=False) # get y difference (now scalar)
gradient = X[idx] * ydiff[:,np.newaxis] # make ydiff a column vector and multiply through
gradient = gradient.reshape(self.w_.shape)
gradient = self.regular_penalty(gradient)
elif(self.solver == "hessian"):
g = self.predict_proba(X,add_bias=False).ravel() # get sigmoid value for all classes
hessian = X.T @ np.diag(g*(1-g)) @ X # calculate the hessian
ydiff = y-g # get y difference
gradient = np.sum(X * ydiff[:,np.newaxis], axis=0) # make ydiff a column vector and multiply through
gradient = gradient.reshape(self.w_.shape)
gradient = self.regular_penalty(gradient)
gradient = pinv(hessian) @ gradient
else:
sys.exit("Unknown solver option: " + self.solver)
return gradient
# public:
def fit(self, X, y):
Xb = self._add_bias(X) # add bias term
num_samples, num_features = Xb.shape
self.w_ = np.zeros((num_features,1)) # init weight vector to zeros
# for as many as the max iterations
for _ in range(self.iters):
gradient = self._get_gradient(Xb,y)
self.w_ += gradient*self.eta # multiply by learning rate
CPU times: user 46 µs, sys: 1 µs, total: 47 µs Wall time: 49.8 µs
class LogisticRegression:
def __init__(self, eta, iterations=20, C1=0.001, C2=0.001, solver="steepest", regularize="none"):
self.eta = eta
self.iters = iterations
self.C1 = C1
self.C2 = C2
self.solver = solver
self.regularize = regularize
# internally we will store the weights as self.w_ to keep with sklearn conventions
def __str__(self):
if(hasattr(self,'w_')):
return 'MultiClass Logistic Regression Object with coefficients:\n'+ str(self.w_) # if we have trained the object
else:
return 'Untrained MultiClass Logistic Regression Object'
def fit(self,X,y):
num_samples, num_features = X.shape
self.unique_ = np.unique(y) # get each unique class value
num_unique_classes = len(self.unique_)
self.classifiers_ = [] # will fill this array with binary classifiers
for yval in self.unique_: # for each unique value
y_binary = (y==yval) # create a binary problem
# train the binary classifier for this class
blr = VectorBinaryLogisticRegression(self.eta,
self.iters,
self.C1,
self.C2,
self.solver,
self.regularize)
blr.fit(X,y_binary)
# add the trained classifier to the list
self.classifiers_.append(blr)
# save all the weights into one matrix, separate column for each class
self.w_ = np.hstack([x.w_ for x in self.classifiers_]).T
def predict_proba(self,X):
probs = []
for blr in self.classifiers_:
probs.append(blr.predict_proba(X)) # get probability for each classifier
return np.hstack(probs) # make into single matrix
def predict(self,X):
return self.unique_[np.argmax(self.predict_proba(X),axis=1)] # take argmax along row
lr = LogisticRegression(0.1,1500)
print(lr)
Untrained MultiClass Logistic Regression Object
import seaborn as sns
import random
import pandas as pd
import plotly.express as px
data = np.zeros((3,3))
u = 0
penalty1 = []
penalty2 = []
x_train,x_test,y_train,y_test=train_test_split(X,y,test_size=0.2, train_size=0.8, random_state=123)
for method in ['steepest', 'stochastic', 'hessian']:
pen = random.uniform(0, 1)
i = 0
for regVal in ['L1', 'L2', 'L1+L2']:
lr = LogisticRegression(eta=0.05, iterations=500, C1=pen, C2=pen, solver=method, regularize=regVal)
lr.fit(x_train,y_train)
yhat = lr.predict(x_test)
data[u][i] = accuracy_score(yhat, y_test)*100
penalty1.append(round(pen, 2))
penalty2.append(round(pen,2))
i+=1
u+=1
df_data = pd.DataFrame(data={'Method': ['Steepest','Stochastic','Newton'],
'L1': [data[0][0], data[1][0], data[2][0]],
'L2': [data[0][1], data[1][1], data[2][1]],
'L1+L2': [data[0][2], data[1][2], data[2][2]]})
df_data= pd.melt(df_data, id_vars="Method", var_name='Regularization', value_name='Accuracy')
df_data["C1"] = penalty1
df_data["C2"] = penalty2
print(df_data)
ax = px.bar(df_data, x='Method', y='Accuracy', color='Accuracy', hover_data=["C1", "C2", "Regularization"])
ax.show()
       Method Regularization  Accuracy    C1    C2
0    Steepest             L1   63.1250  0.27  0.27
1  Stochastic             L1   54.0625  0.27  0.27
2      Newton             L1   82.1875  0.27  0.27
3    Steepest             L2   81.8750  0.80  0.80
4  Stochastic             L2   81.8750  0.80  0.80
5      Newton             L2   82.1875  0.80  0.80
6    Steepest          L1+L2   82.5000  0.65  0.65
7  Stochastic          L1+L2   81.8750  0.65  0.65
8      Newton          L1+L2   82.1875  0.65  0.65
The best performing classifier is the steepest-gradient descent method using elastic-net regularization. The penalty value for C1 and C2 is 0.65, selected at random from the range 0 to 1. The graph compares the accuracies of each optimization technique side by side, the color pattern distinguishes between the regularization methods, and the randomly selected penalties are displayed in the hover text box.
This method avoids data snooping by selecting the penalties at random. It does not take the accuracy score from a previous iteration of an optimization technique and use that knowledge to find optimal hyperparameters. Since the hyperparameter values are chosen at random, they cannot be used to overfit the regression to the test set.
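The same idea scales up to random search with cross-validation scored on the training folds only, so the held-out test set stays untouched until the final evaluation. A sketch using scikit-learn's RandomizedSearchCV on synthetic stand-in data (the model, data, and C range here are illustrative, not the wine experiment):

```python
from scipy.stats import uniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X_demo, y_demo = make_classification(n_samples=400, random_state=0)
x_tr, x_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2,
                                          random_state=123)

# C is drawn at random and scored only on training folds, so the held-out
# test set never influences the hyperparameter choice
search = RandomizedSearchCV(LogisticRegression(max_iter=1000),
                            {'C': uniform(0.01, 1.0)},
                            n_iter=5, cv=3, random_state=0)
search.fit(x_tr, y_tr)
print(search.best_params_['C'], search.score(x_te, y_te))
```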
import time
from sklearn.linear_model import LogisticRegression as SKLogisticRegression
compare = np.zeros((15,4))
x_train,x_test,y_train,y_test=train_test_split(X,y,test_size=0.2, train_size=0.8, random_state=123)
lr_sk = SKLogisticRegression(solver='liblinear') # liblinear solver, remaining params default
lr = LogisticRegression(eta=0.05, iterations=500, solver='steepest',
C1=0.65, C2=0.65, regularize='L1+L2') # choose the best optimization technique
for k in range(2):
if k == 0:
lr_sk.fit(x_train,y_train) #fit scikit-learn regression model
else:
lr.fit(x_train,y_train) #fit custom regression model
for i in range(15):
if k == 0:
start_time = time.time()
yhat = lr_sk.predict(x_test)
compare[i][k] = accuracy_score(yhat, y_test)*100
compare[i][2] = (time.time() - start_time)*1000000 #convert to microseconds
else:
start_time = time.time()
yhat = lr.predict(x_test)
compare[i][k] = accuracy_score(yhat, y_test)*100
compare[i][3] = (time.time() - start_time)*1000000 #convert to microseconds
# column 0 holds the scikit-learn accuracies (filled when k == 0) and column 1 the custom ones
compare_data = pd.DataFrame(dict(CustomTime=compare[:, 3], SkTime=compare[:, 2], CustomAccuracy=compare[:, 1], SkAccuracy=compare[:, 0]))
print(compare_data)
fig = px.scatter(compare_data, hover_data=["CustomAccuracy", "SkAccuracy"], trendline="lowess")
fig.update_xaxes(title_text='Execution')
fig.update_yaxes(title_text='Time (μs)')
fig.show()
    CustomTime      SkTime  CustomAccuracy  SkAccuracy
0   304.937363  474.214554            82.5     82.1875
1   225.067139  279.903412            82.5     82.1875
2   216.007233  254.869461            82.5     82.1875
3   214.099884  250.101089            82.5     82.1875
4   209.093094  256.061554            82.5     82.1875
5   206.708908  240.802765            82.5     82.1875
6   207.185745  237.703323            82.5     82.1875
7   206.947327  231.981277            82.5     82.1875
8   205.993652  231.027603            82.5     82.1875
9   205.278397  231.981277            82.5     82.1875
10  205.993652  231.027603            82.5     82.1875
11  205.039978  229.835510            82.5     82.1875
12  204.086304  230.073929            82.5     82.1875
13  205.039978  232.219696            82.5     82.1875
14  205.993652  231.981277            82.5     82.1875
The custom method classifies the test set quicker than the scikit-learn method (only prediction time is measured here, not training). The accuracy scores are also marginally higher with the custom method. The custom model uses elastic-net regularization to introduce bias when fitting the training data, which reduces variance on the test data and yields greater accuracy for subsequent predictions in the long run. Elastic-net regularization combines L1 and L2 penalties so that features which add little value to the prediction are shrunk away and the magnitudes of the important features never become too large. Elastic-net regularization produced the best accuracy scores for steepest descent and Newton's method, so this regularization technique likely made a large contribution to this performance.
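For reference, scikit-learn's LogisticRegression also supports an elastic-net penalty via the saga solver. A minimal sketch on synthetic data (the C and l1_ratio values are illustrative choices, not tuned for the wine data):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = make_classification(n_samples=300, random_state=0)
X_demo = StandardScaler().fit_transform(X_demo)  # saga converges best on scaled data

# l1_ratio blends the L1 and L2 penalties (0 = pure L2, 1 = pure L1)
clf = LogisticRegression(penalty='elasticnet', solver='saga',
                         l1_ratio=0.5, C=0.65, max_iter=5000)
clf.fit(X_demo, y_demo)
print(round(clf.score(X_demo, y_demo), 2))
```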
The implementation that should be deployed is the scikit-learn implementation. Although the custom implementation is faster, the gains in accuracy are not enough to justify deploying it over scikit-learn's logistic regression model. The scikit-learn implementation is more user friendly because it works with sensible defaults, so the user does not have to spend time adjusting many hyperparameters. This is important because the primary users of this model will be wine experts, whose time is better spent executing the model to explore wine rather than tuning hyperparameters. Since this model is meant for offline analysis, execution time is not as important. This model will not be put in a high-risk application like a self-driving car, so speed matters less given that the application is to further the study of wine.